<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=UTF-8" />
<title>GenomeTools - manual page for GT-HOP(1)</title>
<link rel="stylesheet" type="text/css" href="style.css">
<link rel="stylesheet" href="..//style.css" type="text/css" />
</head>
<body>
<div id="menu">
<ul>
<li><a href="../index.html">Overview</a></li>
<li><a href="https://github.com/genometools/genometools/releases">Releases</a></li>
<li><a href="pub/">Archive</a></li>
<li><a href="https://github.com/genometools/genometools">Browse source</a></li>
<li><a href="http://github.com/genometools/genometools/issues/">Issue tracker</a></li>
<li><a href="../documentation.html">Documentation</a></li>
  <ul class="submenu">
    <li><a id="current" href="../tools.html">Tools</a></li>
    <li><a href="../manuals.html">Manuals</a></li>
    <li><a href="../libgenometools.html">C API</a></li>
    <li><a href="../docs.html"><tt>gtscript</tt> docs</a></li>
    <li><a href="../contract.html">Development Contract</a></li>
    <li><a href="../contribute.html">Contribute</a></li>
  </ul>
<li><a href="../annotationsketch.html"><tt>AnnotationSketch</tt></a></li>
<li><a href="../cgi-bin/gff3validator.cgi">GFF3 validator</a></li>
<li><a href="../license.html">License</a></li>
</ul>
</div>
<div id="main">
<div class="sect1">
<h2 id="_name">NAME</h2>
<div class="sectionbody">
<div class="paragraph"><p>gt-hop - Cognate sequence-based homopolymer error correction.</p></div>
</div>
</div>
<div class="sect1">
<h2 id="_synopsis">SYNOPSIS</h2>
<div class="sectionbody">
<div class="paragraph"><p><strong>gt hop</strong> -&lt;mode&gt; -c &lt;encseq&gt; -map &lt;sam/bam&gt; -reads &lt;fastq&gt; [options&#8230;]</p></div>
</div>
</div>
<div class="sect1">
<h2 id="_description">DESCRIPTION</h2>
<div class="sectionbody">
<div class="dlist"><dl>
<dt class="hdlist1">
<strong>-c</strong> [<em>string</em>]
</dt>
<dd>
<p>
cognate sequence
(encoded using gt encseq encode)
</p>
</dd>
<dt class="hdlist1">
<strong>-map</strong> [<em>string</em>]
</dt>
<dd>
<p>
mapping of reads to the cognate sequence
it must be in SAM/BAM format, and sorted by coordinate
(can be prepared e.g. using: samtools sort)
</p>
</dd>
<dt class="hdlist1">
<strong>-sam</strong> [<em>yes|no</em>]
</dt>
<dd>
<p>
mapping file is SAM
default: BAM
</p>
</dd>
<dt class="hdlist1">
<strong>-aggressive</strong> [<em>yes|no</em>]
</dt>
<dd>
<p>
correct as much as possible
</p>
</dd>
<dt class="hdlist1">
<strong>-moderate</strong> [<em>yes|no</em>]
</dt>
<dd>
<p>
mediate between sensitivity and precision
</p>
</dd>
<dt class="hdlist1">
<strong>-conservative</strong> [<em>yes|no</em>]
</dt>
<dd>
<p>
correct only most likely errors
</p>
</dd>
<dt class="hdlist1">
<strong>-expert</strong> [<em>yes|no</em>]
</dt>
<dd>
<p>
manually select correction criteria
</p>
</dd>
<dt class="hdlist1">
<strong>-reads</strong> 
</dt>
<dd>
<p>
uncorrected read file(s) in FastQ format;
the corrected reads are output in the currect working directory in files which are named as the input files, each prepended by a prefix (see -outprefix option)
-reads allows one to output the reads in the same order as in the input and is mandatory if the SAM contains more than a single primary alignment for each read (e.g. output of bwasw)
see also -o option as an alternative
</p>
</dd>
<dt class="hdlist1">
<strong>-outprefix</strong> [<em>string</em>]
</dt>
<dd>
<p>
prefix for output filenames (corrected reads)when -reads is specified
the prefix is prepended to each input filename (default: hop_)
</p>
</dd>
<dt class="hdlist1">
<strong>-o</strong> [<em>string</em>]
</dt>
<dd>
<p>
output file for corrected reads
(see also -reads/-outprefix) if -o is used, reads are output in a single file in the order they are found in the SAM file (which usually differ from the original order)
this will only work if the reads were aligned with a software which only includes 1 alignment for each read (e.g. bwa) (default: undefined)
</p>
</dd>
<dt class="hdlist1">
<strong>-hmin</strong> [<em>value</em>]
</dt>
<dd>
<p>
minimal homopolymer length in cognate sequence (default: 3)
</p>
</dd>
<dt class="hdlist1">
<strong>-read-hmin</strong> [<em>value</em>]
</dt>
<dd>
<p>
minimal homopolymer length in reads (default: 2)
</p>
</dd>
<dt class="hdlist1">
<strong>-qmax</strong> [<em>value</em>]
</dt>
<dd>
<p>
maximal average quality of homopolymer in a read (default: 120)
</p>
</dd>
<dt class="hdlist1">
<strong>-altmax</strong> [<em>value</em>]
</dt>
<dd>
<p>
max support of alternate homopol. length;
e.g. 0.8 means: do not correct any read if homop. length in more than 80%% of the reads has the same value, different from the cognate
if altmax is set to 1.0 reads are always corrected (default: 0.800000)
</p>
</dd>
<dt class="hdlist1">
<strong>-cogmin</strong> [<em>value</em>]
</dt>
<dd>
<p>
min support of cognate sequence homopol. length;
e.g. 0.1 means: do not correct any read if cognate homop. length is not present in at least 10%% of the reads
if cogmin is set to 0.0 reads are always corrected
</p>
</dd>
<dt class="hdlist1">
<strong>-mapqmin</strong> [<em>value</em>]
</dt>
<dd>
<p>
minimal mapping quality (default: 21)
</p>
</dd>
<dt class="hdlist1">
<strong>-covmin</strong> [<em>value</em>]
</dt>
<dd>
<p>
minimal coverage;
e.g. 5 means: do not correct any read if coverage (number of reads mapped over whole homopolymer) is less than 5
if covmin is set to 1 reads are always corrected (default: 1)
</p>
</dd>
<dt class="hdlist1">
<strong>-allow-muliple</strong> [<em>yes|no</em>]
</dt>
<dd>
<p>
allow multiple corrections in a read (default: no)
</p>
</dd>
<dt class="hdlist1">
<strong>-clenmax</strong> [<em>value</em>]
</dt>
<dd>
<p>
maximal correction length
default: unlimited
</p>
</dd>
<dt class="hdlist1">
<strong>-ann</strong> [<em>string</em>]
</dt>
<dd>
<p>
annotation of cognate sequence
it must be sorted by coordinates on the cognate sequence
(this can be e.g. done using: gt gff3 -sort)
if -ann is used, corrections will be limited to homopolymers startingor ending inside the feature type indicated by -ft optionformat: sorted GFF3 (default: undefined)
</p>
</dd>
<dt class="hdlist1">
<strong>-ft</strong> [<em>string</em>]
</dt>
<dd>
<p>
feature type to use when -ann option is specified (default: CDS)
</p>
</dd>
<dt class="hdlist1">
<strong>-v</strong> [<em>yes|no</em>]
</dt>
<dd>
<p>
be verbose (default: no)
</p>
</dd>
<dt class="hdlist1">
<strong>-help</strong> 
</dt>
<dd>
<p>
display help for basic options and exit
</p>
</dd>
<dt class="hdlist1">
<strong>-help+</strong> 
</dt>
<dd>
<p>
display help for all options and exit
</p>
</dd>
<dt class="hdlist1">
<strong>-version</strong> 
</dt>
<dd>
<p>
display version information and exit
</p>
</dd>
</dl></div>
<div class="paragraph"><p>Correction mode:</p></div>
<div class="paragraph"><p>One of the options <em>-aggressive</em>, <em>-moderate</em>, <em>-conservative</em> or <em>-expert</em>
must be selected.</p></div>
<div class="paragraph"><p>The <em>-aggressive</em>, <em>-moderate</em> and <em>-conservative</em> modes are presets of
the criteria by which it is decided if an observed discrepancy in
homopolymer length between cognate sequence and a read shall be corrected
or not. A description of the single criteria is provided by using
the <em>-help+</em>' option. The presets are equivalent to the following settings:</p></div>
<div class="literalblock">
<div class="content">
<pre><tt>                    -aggressive    -moderate      -conservative
-hmin               3              3              3
-read-hmin          1              1              2
-altmax             1.00           0.99           0.80
-refmin             0.00           0.00           0.10
-mapqmin            0              10             21
-covmin             1              1              1
-clenmax            unlimited      unlimited      unlimited
-allow-multiple     yes            yes            no</tt></pre>
</div></div>
<div class="paragraph"><p>The aggressive mode tries to maximize the sensitivity, the conservative
mode to minimize the false positives. An even more conservative set
of corrections can be achieved using the <em>-ann</em> option (see <em>-help+</em>).</p></div>
<div class="paragraph"><p>The <em>-expert</em> mode allows one to manually set each parameter; the default
values are the same as in the <em>-conservative</em> mode.</p></div>
<div class="paragraph"><p>(Finally, for evaluation purposes only, the <em>-state-of-truth</em> mode can be used:
this mode assumes that the sequenced genome has been specified
as cognate sequence and outputs an ideal list of corrections.)</p></div>
</div>
</div>
<div class="sect1">
<h2 id="_reporting_bugs">REPORTING BUGS</h2>
<div class="sectionbody">
<div class="paragraph"><p>Report bugs to <a href="https://github.com/genometools/genometools/issues">https://github.com/genometools/genometools/issues</a>.</p></div>
</div>
</div>
<div id="footer">
Copyright &copy; 2007-2023 The <i>GenomeTools</i> authors.
</div>
</div>
</body>
</html>
